Programming with StarPU
Abstract
Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs and “accelerators”, often in the form of graphical processing units (GPUs). StarPU is a C library to exploit such platforms. It provides users with ways to define tasks to be executed on CPUs or GPUs, along with the dependencies among them, and automatically schedules them over all the available processing units. In doing so, it also relieves programmers from the need to know the underlying architecture details: it adapts to the available CPUs and GPUs, and automatically transfers data between main memory and GPUs as needed. While StarPU’s approach is successful at addressing run-time scheduling issues, being a C library makes for a poor and error-prone programming interface. This paper presents an effort started in 2011 to promote some of the concepts exported by the library as C language constructs, by means of an extension of the GCC compiler suite. Our main contribution is the design and implementation of language extensions that map to StarPU’s task programming paradigm. We argue that the proposed extensions make it easier to get started with StarPU, eliminate errors that can occur when using the C library, and help diagnose possible mistakes. We conclude on future work.
Key-words: parallel programming, GPU, scheduling, programming language support
hal-00807033, version 2, 5 Apr 2013
Similar resources
Flexible Runtime Support for Efficient Skeleton Programming on Heterogeneous GPU-based Systems
SkePU is a skeleton programming framework for multicore CPU and multi-GPU systems. StarPU is a runtime system that provides dynamic scheduling and memory management support for heterogeneous, accelerator-based systems. We have implemented support for StarPU as a possible backend for SkePU while keeping the generic SkePU interface intact. The mapping of a SkePU skeleton call to one or more StarP...
C Language Extensions for Hybrid CPU/GPU Programming with StarPU
Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs and “accelerators”, often in the form of graphical processing units (GPUs). StarPU is a C library to exploit such platforms. It provides users with ways to define tasks to be executed on CPUs or GPUs, along with the dependencies among them, and by automatically scheduling them over all the...
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators
GPU clusters are becoming widespread HPC platforms. Exploiting them is however challenging, as this requires two separate paradigms (MPI and CUDA or OpenCL) and careful load balancing due to node heterogeneity. Current paradigms usually either limit themselves to offloading part of the computation and leave CPUs idle, or require static CPU/GPU work partitioning. We thus have previously proposed S...
Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System
Core specialization is currently one of the most promising ways of designing power-efficient multicore chips. However, approaching the theoretical peak performance of such heterogeneous multicore architectures with specialized accelerators is a complex issue. While substantial effort has been devoted to efficiently offloading parts of the computation, designing an execution model that unifies...
Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs
for (n = 0; n < nt; n++)        // loop on cols
  for (m = 0; m < mt; m++)      // loop on rows
    starpu_matrix_data_register(&tile_handle[m][n], 0,
                                &tile[m][n], M, M, N, sizeof(float));

Figure 3: Registration of the tiles as handles of matrix data type.

Initialization. When initializing StarPU with starpu_init, StarPU automatically detects the ...